perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob …#159
perf: Cache or narrow TestSuiteGate test discovery (avoid full rglob …#159shrutu0929 wants to merge 1 commit intoRefactron-ai:mainfrom
Conversation
📝 WalkthroughWalkthroughThe Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@refactron/verification/checks/test_gate.py`:
- Line 118: Remove the trailing whitespace in the test file (test_gate.py) that
is causing the pre-commit hook to fail: open the file and delete the trailing
spaces at the ends of the affected lines, ensure your editor is set to trim
trailing whitespace on save (or run a trim command), then re-run the pre-commit
hooks (or git commit) to verify the hook passes before pushing.
- Around line 112-113: The cache key currently uses module_name = file_path.stem
which can collide across packages; instead compute a module-scoped identifier
based on the file path relative to the search_root/project_root (use search_root
or self.project_root to derive file_path.relative_to(search_root) and strip the
suffix, converting separators to a dotted module path or a normalized path
string) and replace all uses of module_name in the cache lookup/write sites (the
cache read around where reuse occurs and the cache write where stale writes
happen) to use this new module_path; also defend against files not under the
search_root by falling back to file_path.as_posix() to avoid exceptions.
- Around line 21-24: The __init__ method in the TestGate class is missing an
explicit return type; add an explicit "-> None" annotation to the __init__
signature (the constructor defined as def __init__(...)) to comply with the
typed-def policy and mypy disallow_untyped_defs; update the signature only (no
body changes) so the method becomes explicitly typed.
- Around line 119-120: The current discovery narrows search_dirs to only
test_dirs when a tests/ or test/ exists, which misses colocated tests; modify
the logic in test_gate.py so that search_dirs includes both the found test
directories and the repository root for fallback (e.g., make search_dirs =
test_dirs + [search_root] with deduplication/ordering) instead of replacing the
root entirely, ensuring colocated tests like pkg/foo_test.py are still
discovered while retaining the fast-path of scanning tests/ or test/.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9ef1b33d-49fc-433e-92ef-dc73a802ab3a
📒 Files selected for processing (1)
refactron/verification/checks/test_gate.py
| def __init__(self, project_root: Optional[Path] = None): | ||
| self.project_root = project_root | ||
| self._test_file_cache: Optional[Dict[str, List[Path]]] = None | ||
| self._all_test_files: Optional[List[Path]] = None |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add explicit -> None on __init__ to satisfy typed-def policy.
At Line 21, __init__ is missing an explicit return type, which violates the typed-def rule under refactron/.
Proposed fix
- def __init__(self, project_root: Optional[Path] = None):
+ def __init__(self, project_root: Optional[Path] = None) -> None:
self.project_root = project_root
self._test_file_cache: Optional[Dict[str, List[Path]]] = None
self._all_test_files: Optional[List[Path]] = NoneAs per coding guidelines, "Type annotations are required in refactron/ with mypy disallow_untyped_defs = true enabled."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __init__(self, project_root: Optional[Path] = None): | |
| self.project_root = project_root | |
| self._test_file_cache: Optional[Dict[str, List[Path]]] = None | |
| self._all_test_files: Optional[List[Path]] = None | |
| def __init__(self, project_root: Optional[Path] = None) -> None: | |
| self.project_root = project_root | |
| self._test_file_cache: Optional[Dict[str, List[Path]]] = None | |
| self._all_test_files: Optional[List[Path]] = None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 21 - 24, The
__init__ method in the TestGate class is missing an explicit return type; add an
explicit "-> None" annotation to the __init__ signature (the constructor defined
as def __init__(...)) to comply with the typed-def policy and mypy
disallow_untyped_defs; update the signature only (no body changes) so the method
becomes explicitly typed.
| module_name = file_path.stem | ||
| search_root = self.project_root or file_path.parent |
There was a problem hiding this comment.
Cache key should include module path, not only file stem.
At Line 112, using file_path.stem as the cache key can collide across packages (a/utils.py vs b/utils.py),
causing incorrect reuse from Line 131 and stale writes at Line 145.
Proposed fix
module_name = file_path.stem
+ module_cache_key = str(file_path.resolve())
search_root = self.project_root or file_path.parent
@@
- if module_name in self._test_file_cache:
- return self._test_file_cache[module_name]
+ if module_cache_key in self._test_file_cache:
+ return self._test_file_cache[module_cache_key]
@@
- self._test_file_cache[module_name] = test_files
+ self._test_file_cache[module_cache_key] = test_files
return test_filesAlso applies to: 131-132, 145-146
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 112 - 113, The cache
key currently uses module_name = file_path.stem which can collide across
packages; instead compute a module-scoped identifier based on the file path
relative to the search_root/project_root (use search_root or self.project_root
to derive file_path.relative_to(search_root) and strip the suffix, converting
separators to a dotted module path or a normalized path string) and replace all
uses of module_name in the cache lookup/write sites (the cache read around where
reuse occurs and the cache write where stale writes happen) to use this new
module_path; also defend against files not under the search_root by falling back
to file_path.as_posix() to avoid exceptions.
| if self._test_file_cache is None: | ||
| self._test_file_cache = {} | ||
| self._all_test_files = [] | ||
|
|
There was a problem hiding this comment.
Remove trailing whitespace to unblock pre-commit.
Lines around Line 118/122/130/144 include trailing spaces, and pre-commit already failed on this hook.
Also applies to: 122-122, 130-130, 144-144
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` at line 118, Remove the trailing
whitespace in the test file (test_gate.py) that is causing the pre-commit hook
to fail: open the file and delete the trailing spaces at the ends of the
affected lines, ensure your editor is set to trim trailing whitespace on save
(or run a trim command), then re-run the pre-commit hooks (or git commit) to
verify the hook passes before pushing.
| test_dirs = [d for d in [search_root / "tests", search_root / "test"] if d.is_dir()] | ||
| search_dirs = test_dirs if test_dirs else [search_root] |
There was a problem hiding this comment.
Discovery narrowing can miss valid tests in mixed layouts.
At Line 120, when tests/ or test/ exists, discovery skips all other directories. Repos with colocated tests
(e.g., pkg/foo_test.py) will be silently excluded, which can let regressions pass verification.
Safer fallback pattern (keep fast path, avoid false negatives)
test_dirs = [d for d in [search_root / "tests", search_root / "test"] if d.is_dir()]
search_dirs = test_dirs if test_dirs else [search_root]
@@
test_files: List[Path] = []
for py_file in self._all_test_files: # type: ignore
@@
if self._imports_module(source, module_name):
test_files.append(py_file)
except Exception:
continue
+
+ # Mixed-layout fallback: if canonical test dirs exist but yielded nothing relevant,
+ # do one broader pass to catch colocated tests.
+ if not test_files and test_dirs:
+ for py_file in search_root.rglob("*.py"):
+ if any(excluded in py_file.parts for excluded in excluded_dirs):
+ continue
+ name = py_file.name
+ if not (name.startswith("test_") or name.endswith("_test.py")):
+ continue
+ try:
+ source = py_file.read_text(encoding="utf-8")
+ if self._imports_module(source, module_name):
+ test_files.append(py_file)
+ except Exception:
+ continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@refactron/verification/checks/test_gate.py` around lines 119 - 120, The
current discovery narrows search_dirs to only test_dirs when a tests/ or test/
exists, which misses colocated tests; modify the logic in test_gate.py so that
search_dirs includes both the found test directories and the repository root for
fallback (e.g., make search_dirs = test_dirs + [search_root] with
deduplication/ordering) instead of replacing the root entirely, ensuring
colocated tests like pkg/foo_test.py are still discovered while retaining the
fast-path of scanning tests/ or test/.
solve #153
The latest update aggressively drastically optimizes the time TestSuiteGate spends dynamically discovering test coverage bindings, fixing a significant performance bottleneck scaling heavily with overall project sizes. Previously, the verification check blindly executed a full recursive search (rglob("*.py")) across the entire workspace structure for every single fix candidate, constantly rescanning bloated internal dependency folders (like .venv or node_modules) and redundantly parsing AST layouts on unchanged assets. We solved this by instituting three layers of restriction: first, we instituted strict semantic path exclusions to bypass generic, noisy directories entirely; second, the search mechanism now explicitly pivots to prioritize established structural endpoints like tests/ or test/; finally, we configured a persisting dictionary state-cache at the class level. By securely storing the mapped outcome of which files internally import which module branches, subsequent checks against identical files immediately short-circuit without re-triggering disk I/O, ensuring verification iteration speeds reliably parallel your actual test scope density instead of being bogged down by sheer repository weight.
Summary by CodeRabbit
Release Notes